49 - NHR PerfLab Seminar 2023-04-18: Conquering Noise With Hardware Counters on HPC Systems [ID:47943]

This is joint work between the Technical University of Darmstadt and Forschungszentrum Jülich.

As you can see, Markus, Alexander, Felix, and I are from the Technical University of Darmstadt, while Nour and Bernd are from Forschungszentrum Jülich.

To be precise, we presented part of this work at the ProTools workshop at SC22 in Dallas last year.

What you will see here is a somewhat extended version, along with a few aspects of future work.

So let's start with the motivation.

As you know, HPC systems are becoming more and more powerful and complex, and so are the applications. It is therefore very important to identify performance bottlenecks at an early stage. This is our motivation.

The usual approach is performance modeling, which has a long research history. It is very good at predicting the scaling behavior of an application and thus allows you to identify performance bottlenecks at an early stage.
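To make "predicting the scaling behavior" concrete, here is a minimal sketch of empirical performance modeling, assuming runtime measurements at several process counts p. It fits t(p) ≈ c0 + c1·f(p) by least squares for a few candidate terms f and keeps the one with the smallest residual. This is a deliberately simplified stand-in for illustration, not the modeling tool used in this work; the candidate set and all function names are hypothetical.

```python
import math

# Hypothetical candidate terms f(p) for a model t(p) = c0 + c1 * f(p).
# The constant part is covered by c0, so it is not listed as a candidate.
CANDIDATES = {
    "log2(p)": lambda p: math.log2(p),
    "p": lambda p: float(p),
    "p*log2(p)": lambda p: p * math.log2(p),
    "p^2": lambda p: float(p) ** 2,
}


def fit_linear(xs, ys):
    """Ordinary least squares for y = c0 + c1 * x; returns (c0, c1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    return my - c1 * mx, c1


def fit_scaling_model(ps, times):
    """Fit every candidate term and return (term, c0, c1, rss) for the term
    with the smallest residual sum of squares."""
    best = None
    for term, f in CANDIDATES.items():
        xs = [f(p) for p in ps]
        c0, c1 = fit_linear(xs, times)
        rss = sum((t - (c0 + c1 * x)) ** 2 for x, t in zip(xs, times))
        if best is None or rss < best[3]:
            best = (term, c0, c1, rss)
    return best


# Usage: synthetic, noise-free timings that follow 1 + 0.5 * p * log2(p).
ps = [2, 4, 8, 16, 32]
times = [1.0 + 0.5 * p * math.log2(p) for p in ps]
term, c0, c1, rss = fit_scaling_model(ps, times)
print(term, round(c0, 3), round(c1, 3))  # p*log2(p) 1.0 0.5
```

With noisy timings, the residuals of the different candidates can move closer together, so the selected term or its coefficients may change from one set of runs to the next.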

The problem is that performance models depend on measurements, and in noisy environments these measurements contain a lot of noise. They show strong variations and can become irreproducible and misleading.

If you use such measurements to generate performance models, the models can deviate strongly from the real behavior of the application, and this is something we want to avoid.

What we want instead is a performance model that accurately describes the scaling behavior of the application.

So let's look into this topic a bit more.

Here you can see a graph. Don't worry about the details right now; I will explain it at the end of the presentation.

On the x-axis, you have the relative deviation from the mean, and on the y-axis you have the metric, in this case, for example, the time.

You can think of it this way: we repeated the runs several times and computed the deviation of each run from the mean. The farther away the values are, the larger the deviation.
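A minimal sketch of the quantity on the x-axis, assuming it is computed per metric as (x_i − mean)/mean over the repeated runs; the exact normalization used in the plot is not stated in the talk, so treat this as an assumption. The helper names and the runtimes below are hypothetical.

```python
from statistics import mean


def relative_deviation_from_mean(samples):
    """Return (x_i - mean) / mean for each repeated measurement x_i."""
    m = mean(samples)
    if m == 0:
        raise ValueError("mean is zero; relative deviation is undefined")
    return [(x - m) / m for x in samples]


def spread(samples):
    """Width of the distribution: largest minus smallest relative deviation."""
    devs = relative_deviation_from_mean(samples)
    return max(devs) - min(devs)


# Usage with made-up runtimes (seconds) from four repeated runs.
runtimes = [10.0, 10.0, 12.0, 8.0]
print(relative_deviation_from_mean(runtimes))  # [0.0, 0.0, 0.2, -0.2]
print(spread(runtimes))  # 0.4
```

Normalizing by the mean makes metrics with very different magnitudes, such as seconds and operation counts, comparable on one axis.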

You can see two kinds of plots here, a blue one and an orange one.

Interestingly, the blue one is very narrow, while the orange one shows considerable deviation. This means that the time metric, shown here on the y-axis, is strongly affected by noise.

So if you use this metric to generate performance models, the models will show large deviations.

So we asked ourselves: why not use hardware counters? Some hardware counters, such as floating-point operations, are only slightly affected by noise.

You can also see this on the y-axis, where we now have double-precision operations. Interestingly, the values are nearly the same with and without noise.
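The visual comparison can also be summarized numerically. The sketch below assumes each metric was recorded once per repeated run, both without and with added noise, and scores it by the largest of its coefficient of variation (standard deviation divided by mean) in either configuration and the relative shift of its mean when noise is added. The threshold, the two labels, and all function names are hypothetical; the categorization actually used in this work is not described in this part of the talk.

```python
from statistics import mean, pstdev


def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean."""
    m = mean(samples)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return pstdev(samples) / m


def noise_sensitivity(clean, noisy):
    """Largest of: CoV without noise, CoV with noise, and the relative
    shift of the mean when noise is added."""
    shift = abs(mean(noisy) - mean(clean)) / mean(clean)
    return max(coefficient_of_variation(clean),
               coefficient_of_variation(noisy),
               shift)


def classify(metrics, threshold=0.01):
    """metrics maps a metric name to a (clean_runs, noisy_runs) pair.
    Returns name -> "resilient" or "sensitive" (hypothetical threshold)."""
    return {
        name: "resilient" if noise_sensitivity(clean, noisy) < threshold else "sensitive"
        for name, (clean, noisy) in metrics.items()
    }


# Usage with synthetic values chosen only to illustrate the two outcomes.
metrics = {
    "time_s": ([1.00, 1.01, 0.99], [1.00, 1.40, 1.20]),
    "dp_ops": ([4.0e9, 4.0e9, 4.0e9], [4.0e9, 4.0e9, 4.0e9 + 12]),
}
print(classify(metrics))  # {'time_s': 'sensitive', 'dp_ops': 'resilient'}
```

Including the mean shift matters because noise can move a metric consistently without widening its spread, which would still bias a model built from the noisy runs.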

Part of a video series:
Part of a chapter:
NHR@FAU PerfLab Seminar

Accessible via: Open access

Duration: 00:37:18 min

Recording date: 2023-04-18

Uploaded on: 2023-04-21 16:46:05

Language: en-US

Speaker: Ahmad Tarraf, Technical University of Darmstadt, Laboratory for Parallel Programming
Title: Conquering Noise With Hardware Counters on HPC Systems
Date and time: Tuesday, April 18, 2 p.m. – 3 p.m.
Abstract:
With increasing system performance and complexity, it is becoming increasingly crucial to examine the scaling behavior of an application and thus determine performance bottlenecks at early stages. Unfortunately, modeling this trend is a challenging task in the presence of noise, as the measurements can become irreproducible and misleading, thus resulting in strong deviations from the actual behavior. While noise impacts the application runtime, it has little to no effect on some hardware counters like floating-point operations. However, selecting the appropriate counters for performance modeling demands some investigation. In this paper, we perform a noise analysis on various hardware counters. Using our noise generator, we add additional noise on top of the system noise to inspect the counters’ variability. We perform the analysis on five systems with three applications in the presence of various noise patterns and categorize the counters across the systems according to their noise resilience.
Short bio:
Ahmad Tarraf has been working as a postdoc in the Laboratory for Parallel Programming at the Technical University of Darmstadt since 2021. Dr. Tarraf received his B.Sc. degree in Mechatronics Engineering from RHU Lebanon in 2013 and his M.Sc. degree in Mechatronics Engineering, with a specialization in simulation and control, from the Technical University of Darmstadt in 2016. From 2017, he was a research assistant at the Institute of Computer Science at the University of Frankfurt, working in the research field of formal abstraction and verification of analog mixed-signal circuits. He later received his doctoral degree (Dr. rer. nat.) in Computer Science, magna cum laude, from the University of Frankfurt in early 2021. His research interests include high-performance computing, efficient file and storage systems, behavioral modeling, machine learning, formal verification, symbolic analysis, analog mixed-signal design, cybernetics, and robotics. Dr. Tarraf has been involved in several national research projects and is currently involved in the two EuroHPC projects ADMIRE and DEEP-SEA.
For a list of past and upcoming NHR PerfLab seminar events, see: https://hpc.fau.de/research/nhr-perflab-seminar-series/